Basque Speecon-like and Basque SpeechDat MDB-600: speech databases for the development of ASR technology for Basque

نویسندگان

Igor Odriozola

Inma Hernáez

M. Inés Torres

Luis Javier Rodríguez-Fuentes

Mikel Peñagarikano

Eva Navas

چکیده

This paper introduces two databases specifically designed for the development of ASR technology for the Basque language: the Basque Speecon-like database and the Basque SpeechDat MDB-600 database. The former was recorded in an office environment according to the Speecon specifications, whereas the later was recorded through mobile telephones according to the SpeechDat specifications. Both databases were created under an initiative that the Basque Government started in 2005, a program called ADITU, which aimed at developing speech technologies for Basque. The databases belong to the Basque Government. A comprehensive description of both databases is provided in this work, highlighting the differences with regard to their corresponding standard specifications. The paper also presents several initial experimental results for both databases with the purpose of validating their usefulness for the development of speech recognition technology. Several applications already developed with the Basque Speecon-like database are also described. Authors aim to make these databases widely known to the community as well, and foster their use by other groups.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Cross-Lingual Approaches: The Basque Case

Cross-lingual speech recognition could be relevant for Multilingual Automatic Speech Recognition (ASR) systems which work with under-resourced languages and appropriately equipped languages. In the Basque Country, the interest on Multilingual Automatic Speech Recognition systems comes from the fact that there are three official languages in use (Basque, Spanish, and French). . Multilingual Basq...

متن کامل

Semantic speech recognition in the Basque context Part I: cross-lingual approaches

This work, divided into Part I and II, describes the development of GorUP a Semantic Speech Recognition System in the Basque context. Part I analyses crosslingual approaches oriented to under-resourced languages and Part II the development of the Language Identification system. During the development, data optimization methods and Soft Computing methodologies oriented to complex environment are...

متن کامل

Language Identification for Under-Resourced Languages in the Basque Context

Automatic Speech Recognition (ASR) is a broad research area that absorbs many efforts from the research community. The interest on Multilingual Systems arouses in the Basque Country because there are three official languages (Basque, Spanish, and French), and there is much linguistic interaction among them, even if Basque has very different roots than the other two languages. The development of...

متن کامل

The basque speech_dat (II) database: a description and first test recognition results

In this work we present a telephone speech database for Basque, compliant with the guidelines of the Speechdat project. The database contains 1060 calls from the fixed telephone network. We first describe the main aspects of the database design. We also present the recognition results using the database and a set of procedures following the language independent reference recogniser commonly nam...

متن کامل

Versatile Speech Databases for High Quality Synthesis for Basque

This paper presents three new speech databases for standard Basque. They are designed primarily for corpus-based synthesis but each database has its specific purpose: 1) AhoSyn: high quality speech synthesis (recorded also in Spanish), 2) AhoSpeakers: voice conversion and 3) AhoEmo3: emotional speech synthesis. The whole corpus design and the recording process are described with detail. Once th...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2014

Basque Speecon-like and Basque SpeechDat MDB-600: speech databases for the development of ASR technology for Basque

نویسندگان

چکیده

منابع مشابه

Cross-Lingual Approaches: The Basque Case

Semantic speech recognition in the Basque context Part I: cross-lingual approaches

Language Identification for Under-Resourced Languages in the Basque Context

The basque speech_dat (II) database: a description and first test recognition results

Versatile Speech Databases for High Quality Synthesis for Basque

عنوان ژورنال:

اشتراک گذاری